BACKGROUND OF THE INVENTION

1. FIELD OF THE INVENTION

The present invention relates to a system for converting a scanned image to an original 
document. 



Portions of the disclosure of this patent document contain material that is subject to
copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone
of the patent document or the patent disclosure as it appears in the Patent and Trademark Office
file or records, but otherwise reserves all copyright rights whatsoever.

Sun, Sun Microsystems, the Sun logo, Solaris and all Java-based trademarks and logos
are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other
countries. All SPARC trademarks are used under license and are trademarks of SPARC
International, Inc. in the United States and other countries. Products bearing SPARC trademarks
are based upon an architecture developed by Sun Microsystems, Inc.

2. BACKGROUND ART 

Documents typically are either used electronically or they are printed out and a physical copy
of the document is used. When a document is printed out and a physical copy is used, the electronic
version of the document is eventually lost. The physical copy of the document is often hard to
maintain and once the electronic copy is lost, it is hard to send the physical copy of the document to
another person, even if it is maintained.

One solution is to use a scanner. A scanner is a device that is configured to obtain an image
of the document and to transform the image into a computer readable form, called a bitmap. The
bitmap is a representation of the patterns in the original document. A bitmap, however, is
disadvantageous because it is only a representation of the patterns in the document and does not
contain letters, numbers, tables, and other information associated with the document that can be
modified and used by either the sender or the recipient of the document.

Optical Character Recognition 

To partially solve this dilemma, one solution is to use optical character recognition (OCR).
OCR allows a user to take a physical copy of a document, to scan it using a conventional scanner,
and to convert the scanned image into a text file, albeit one with errors, using OCR technology. To
convert the scanned image to a text file, the OCR software looks at the document and attempts to
determine the letters and numbers in the image.

OCR technology, however, does not allow a user to retrieve the original document and there
is no standard for using OCR. OCR simply tries to define the appearance of letters and numbers in
a generic way and does not account for variations in the appearance of letters and numbers when
using different fonts. As such, OCR may or may not be successful in converting an image to a text
file having letters, numbers, and other information.

SUMMARY OF THE INVENTION



The present invention relates to a system for converting a scanned image into an original
document. According to the present invention, a standard document format is defined which
includes specific fonts, font sizes, alignment tags, tabs, margins and other formatting information
such as table definitions and picture definitions, for instance. Then, a scanner with the appropriate
OCR software converts the document back to its original electronic format using the standard
document format.

In one embodiment of the present invention, the formatting standards are placed in the
document by either the software that created the document or the software that converts the
electronic document to a physical copy, such as a printer. In one embodiment, the formatting
standards are marks on one side of the paper to define its alignment and other document attributes.
In another embodiment, the formatting standards are in the form of bar codes.

Also, the scanner hardware/software may define the fonts which it recognizes and these
fonts may be used in the document. In this way, the document format of the present invention is
completely understood from a scanned image, and hence, it may be converted back to the original
document.



BRIEF DESCRIPTION OF THE DRAWINGS

These and other features, aspects and advantages of the present invention will become better
understood with regard to the following description, appended claims and accompanying drawings
where:

Figure 1 is a flowchart describing a system for converting a scanned image to an electronic
document according to an embodiment of the present invention.

Figure 2 is a diagram describing a system for implementing one or more embodiments of the
present invention.

Figure 3 is a flowchart describing a system for converting a scanned image to an electronic
document according to another embodiment of the present invention.



Figure 4 is a diagram of a physical version of a document according to an embodiment of
the present invention.



Figure 5 is a flowchart describing a system for converting a scanned image to an electronic
document according to another embodiment of the present invention.

Figure 6 is an embodiment of a computer execution environment suitable for the present 
invention. 

DETAILED DESCRIPTION OF THE INVENTION

The invention relates to a system for converting a scanned image to an original document.
In the following description, numerous specific details are set forth to provide a more thorough
description of embodiments of the invention. It will be apparent, however, to one skilled in the art,
that the invention may be practiced without these specific details. In other instances, well known
features have not been described in detail so as not to obscure the invention.

According to the present invention, a standard document format is defined which includes
formatting standards, such as specific fonts, font sizes, alignment tags, tabs, margins and other
formatting information such as table definitions and picture definitions, for instance. Then, a
scanner with the appropriate OCR software converts the document back to its original electronic
format using the standard document format. One embodiment of the present invention is shown in
Figure 1.



At step 100, an electronic version of a document has formatting information, in the form of
formatting commands, inserted into it. Then, at step 110, the document is converted to a physical
version which includes the formatting commands. Next, at step 120, the document is scanned by a
scanner implementing the appropriate OCR software to interpret the formatting commands.
Thereafter, at step 130, the document is transformed back into an electronic version by the scanner
using the formatting commands.
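
The four steps of Figure 1 can be sketched as a round trip. The following Python sketch is
illustrative only and is not the disclosed implementation: the `Formatting` fields, the `%%FMT `
marker, the JSON encoding, and all function names are assumptions made for this example, and
printing and scanning are replaced by stand-ins that assume perfect recognition.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class Formatting:
    """Hypothetical formatting record; the field names are illustrative only."""
    font: str
    font_size: int
    margin_left: int


MARKER = "%%FMT "  # hypothetical prefix marking the embedded formatting line


def insert_formatting(text: str, fmt: Formatting) -> str:
    """Step 100: embed formatting information in the electronic document."""
    return MARKER + json.dumps(asdict(fmt), sort_keys=True) + "\n" + text


def print_document(electronic: str) -> list[str]:
    """Step 110: stand-in for printing; the 'page' is a list of printed lines."""
    return electronic.split("\n")


def scan_document(page: list[str]) -> list[str]:
    """Step 120: stand-in for scanning plus OCR; assumes perfect recognition."""
    return list(page)


def reconstruct(scanned: list[str]) -> tuple[str, Formatting]:
    """Step 130: read the formatting line and rebuild the electronic document."""
    header, body = scanned[0], scanned[1:]
    if not header.startswith(MARKER):
        raise ValueError("no formatting information found on page")
    fmt = Formatting(**json.loads(header[len(MARKER):]))
    return "\n".join(body), fmt
```

In this sketch the formatting record travels on the page together with the text, so the final step
does not have to guess fonts or margins from the appearance of the characters.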



One embodiment of a system configured to implement the present invention is shown in the
diagram of Figure 2. Computer system 200 is used to create an electronic version of a document
and to insert formatting commands into the document. Then, printer 210 is used to transform the
electronic document into a physical document 220 with the formatting commands in the document.
Note that in one embodiment, printer 210 inserts the formatting commands rather than computer
system 200. Next, scanner 230 is used to transform the physical document 220 into an electronic
document again using the formatting commands and appropriate OCR software for use in computer
system 240. In one embodiment, computer systems 200 and 240 are the same computer system.

In one embodiment of the present invention, the formatting standard is implemented by
placing marks on one side of the paper to define its alignment and other document attributes. In
another embodiment of the present invention, the formatting standards are inserted into the
document in the form of bar codes. This embodiment of the present invention is shown in Figure
3. At step 300, an electronic version of a document has formatting information in the form of
one or more bar codes inserted into it, for instance when the software used to generate or print the
document is initiated. Then, at step 310, the document is converted to a physical version which
includes the bar codes, for instance using a printer. Next, at step 320, the document is scanned by a
scanner implementing the appropriate OCR software to interpret the bar codes. Thereafter, at step
330, the document is transformed back into an electronic version by the scanner using the bar
codes.
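
As an illustration of how formatting information might be packed into a payload that a bar code
could carry, consider the following Python sketch. The disclosure does not specify a bar code
symbology or a payload layout; the JSON, zlib and Base32 encoding used here, and the names
`encode_payload` and `decode_payload`, are assumptions made only for this example.

```python
import base64
import json
import zlib


def encode_payload(formatting: dict) -> str:
    """Pack formatting information into a compact ASCII string.

    Base32 is used here (an assumption) because it yields only upper-case
    letters, the digits 2-7 and '=' padding.
    """
    raw = json.dumps(formatting, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return base64.b32encode(zlib.compress(raw)).decode("ascii")


def decode_payload(payload: str) -> dict:
    """Recover the formatting information from a decoded bar code string."""
    raw = zlib.decompress(base64.b32decode(payload.encode("ascii")))
    return json.loads(raw.decode("utf-8"))
```

Rendering the string as an actual bar code at step 310 and reading it back at step 320 would be
handled by whatever symbology the printer and scanner share; that part is outside this sketch.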

The embodiment of the present invention where bar codes are used is shown in connection
with the block diagram of Figure 4. In Figure 4, the physical version of the document 400 is divided
up into two main portions. The first portion comprises bar codes 410. The second portion
comprises the textual and pictorial elements of the physical version 420.
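
A scanner-side routine would need to separate the two portions of Figure 4 before interpreting
them. The Python sketch below shows one possible way to do so, under assumptions that are not
in the disclosure: the scanned page is a list of pixel rows (1 for ink, 0 for blank), the bar codes
410 occupy the top of the page, and at least one fully blank row separates them from the textual
and pictorial elements 420. The name `split_page` is hypothetical.

```python
from __future__ import annotations


def split_page(rows: list[list[int]]) -> tuple[list[list[int]], list[list[int]]]:
    """Split a scanned page into (bar code portion, content portion).

    Assumes the bar codes sit above the first run of blank rows; blank rows
    that occur later, inside the content, are left untouched.
    """
    for i, row in enumerate(rows):
        if not any(row):
            # Skip the whole run of blank separator rows.
            j = i
            while j < len(rows) and not any(rows[j]):
                j += 1
            return rows[:i], rows[j:]
    raise ValueError("no blank separator row found")
```

The first returned portion would be handed to the bar code reader and the second to the OCR
software.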



Another embodiment of the present invention is shown in Figure 5. At step 500, an
electronic version of a document has specific fonts, font sizes, alignment tags, tabs, margins, table
definitions, and picture definitions inserted into it. Then, at step 510, the document is converted to
a physical version which includes the formatting commands inputted at step 500. Next, at step 520,
the document is scanned by a scanner implementing the appropriate OCR software to interpret the
formatting commands. Thereafter, at step 530, the document is transformed back into an electronic
version by the scanner using the formatting commands.

Also, the scanner hardware/software may define the fonts which it recognizes and these
fonts may be used in the document. In this way, the document format of the present invention is
completely understood from a scanned image, and hence, it may be converted back to the original
document.
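
The idea that the scanner declares the fonts it recognizes suggests a simple check that document
software could run before printing. The Python sketch below is a hypothetical illustration; the
function name and the case-insensitive comparison of font names are assumptions, not part of the
disclosure.

```python
def unrecognized_fonts(document_fonts, scanner_fonts):
    """Return the document's fonts that the scanner does not declare, sorted.

    Font names are compared case-insensitively (an assumption of this sketch).
    """
    known = {name.casefold() for name in scanner_fonts}
    return sorted({font for font in document_fonts if font.casefold() not in known})
```

An empty result would indicate that every font used in the document is one the scanner has
declared it can recognize.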


Embodiment of Computer Execution Environment (Hardware) 



An embodiment of the invention can be implemented as computer software in the form of
computer readable program code executed in a general purpose computing environment such as
environment 600 illustrated in Figure 6, or in the form of bytecode class files executable within a
Java™ run time environment running in such an environment, or in the form of bytecodes running
on a processor (or devices enabled to process bytecodes) existing in a distributed environment (e.g.,
one or more processors on a network). A keyboard 610 and mouse 611 are coupled to a system bus
618. The keyboard and mouse are for introducing user input to the computer system and
communicating that user input to central processing unit (CPU) 613. Other suitable input devices
may be used in addition to, or in place of, the mouse 611 and keyboard 610. I/O (input/output)
unit 619 coupled to bi-directional system bus 618 represents such I/O elements as a printer, A/V
(audio/video) I/O, etc.



Computer 601 may include a communication interface 620 coupled to bus 618.
Communication interface 620 provides a two-way data communication coupling via a network link
621 to a local network 622. For example, if communication interface 620 is an integrated services
digital network (ISDN) card or a modem, communication interface 620 provides a data
communication connection to the corresponding type of telephone line, which comprises part of
network link 621. If communication interface 620 is a local area network (LAN) card,
communication interface 620 provides a data communication connection via network link 621 to a
compatible LAN. Wireless links are also possible. In any such implementation, communication
interface 620 sends and receives electrical, electromagnetic or optical signals which carry digital data
streams representing various types of information.

Network link 621 typically provides data communication through one or more networks to
other data devices. For example, network link 621 may provide a connection through local network
622 to local server computer 623 or to data equipment operated by ISP 624. ISP 624 in turn
provides data communication services through the world wide packet data communication network
now commonly referred to as the "Internet" 625. Local network 622 and Internet 625 both use
electrical, electromagnetic or optical signals which carry digital data streams. The signals through the
various networks and the signals on network link 621 and through communication interface 620,
which carry the digital data to and from computer 600, are exemplary forms of carrier waves
transporting the information.



Processor 613 may reside wholly on client computer 601 or wholly on server 626 or
processor 613 may have its computational power distributed between computer 601 and server 626.
Server 626 symbolically is represented in Figure 6 as one unit, but server 626 can also be distributed
between multiple "tiers". In one embodiment, server 626 comprises a middle and back tier where
application logic executes in the middle tier and persistent data is obtained in the back tier. In the
case where processor 613 resides wholly on server 626, the results of the computations performed
by processor 613 are transmitted to computer 601 via Internet 625, Internet Service Provider (ISP)
624, local network 622 and communication interface 620. In this way, computer 601 is able to
display the results of the computation to a user in the form of output.

Computer 601 includes a video memory 614, main memory 615 and mass storage 612, all
coupled to bi-directional system bus 618 along with keyboard 610, mouse 611 and processor 613.
As with processor 613, in various computing environments, main memory 615 and mass storage
612 can reside wholly on server 626 or computer 601, or they may be distributed between the two.
Examples of systems where processor 613, main memory 615, and mass storage 612 are distributed
between computer 601 and server 626 include the thin-client computing architecture developed by
Sun Microsystems, Inc., the Palm Pilot computing device and other personal digital assistants,
Internet ready cellular phones and other Internet computing devices, and platform independent
computing environments, such as those which utilize the Java technologies also developed by Sun
Microsystems, Inc.
The mass storage 612 may include both fixed and removable media, such as magnetic,
optical or magnetic optical storage systems or any other available mass storage technology. Bus 618
may contain, for example, thirty-two address lines for addressing video memory 614 or main
memory 615. The system bus 618 also includes, for example, a 32-bit data bus for transferring data
between and among the components, such as processor 613, main memory 615, video memory 614
and mass storage 612. Alternatively, multiplex data/address lines may be used instead of separate
data and address lines.

In one embodiment of the invention, the processor 613 is a microprocessor manufactured
by Motorola, such as the 680X0 processor or a microprocessor manufactured by Intel, such as the
80X86, or Pentium processor, or a SPARC microprocessor from Sun Microsystems, Inc. However,
any other suitable microprocessor or microcomputer may be utilized. Main memory 615 is
comprised of dynamic random access memory (DRAM). Video memory 614 is a dual-ported video
random access memory. One port of the video memory 614 is coupled to video amplifier 616. The
video amplifier 616 is used to drive the cathode ray tube (CRT) raster monitor 617. Video amplifier
616 is well known in the art and may be implemented by any suitable apparatus. This circuitry
converts pixel data stored in video memory 614 to a raster signal suitable for use by monitor 617.
Monitor 617 is a type of monitor suitable for displaying graphic images.

Computer 601 can send messages and receive data, including program code, through the
network(s), network link 621, and communication interface 620. In the Internet example, remote
server computer 626 might transmit a requested code for an application program through Internet
625, ISP 624, local network 622 and communication interface 620. The received code may be
executed by processor 613 as it is received, and/or stored in mass storage 612, or other non-volatile
storage for later execution. In this manner, computer 600 may obtain application code in the form
of a carrier wave. Alternatively, remote server computer 626 may execute applications using
processor 613, and utilize mass storage 612, and/or video memory 615. The results of the execution
at server 626 are then transmitted through Internet 625, ISP 624, local network 622 and
communication interface 620. In this example, computer 601 performs only input and output
functions.



Application code may be embodied in any form of computer program product. A computer
program product comprises a medium configured to store or transport computer readable code, or
in which computer readable code may be embedded. Some examples of computer program
products are CD-ROM disks, ROM cards, floppy disks, magnetic tapes, computer hard drives,
servers on a network, and carrier waves.



The computer systems described above are for purposes of example only. An embodiment
of the invention may be implemented in any type of computer system or programming or
processing environment.

Thus, a system for converting a scanned image to an original document is described in
conjunction with one or more specific embodiments. The invention is defined by the claims and
their full scope of equivalents.